Data Exploration and Cleaning Exercise

  1. Load the demo.xlsx dataset

  2. Rename the columns as suggested below

    Old name          New name
    Age               age
    Gender            gender
    Marital Status    marital_status
    Address           address
    Income            income
    Income Category   income_category
    Job Category      job_category

  3. Display all the columns in the dataset

  4. Display some basic statistics about the numeric variables in the dataset

  5. Display some basic statistics about the categorical variables in the dataset

  6. What are the unique values in the gender column?

  7. Fix any problems you observe in the gender column, and briefly explain why and how you fixed them

  8. How many observations have ‘no answer’ for marital status?

  9. Write code that returns only the numeric variables from the dataset

  10. Are there any missing values in the dataset?

  11. Are there any outliers in the income variable?

  12. Investigate the relationship between age and income

  13. How many people earn more than 300 units?

  14. What data type is the marital status?

  15. Create dummy variables for gender
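
One possible way to approach the steps above with pandas is sketched below. It is not a model answer. The contents of demo.xlsx are not shown in this exercise, so the rows here are invented stand-ins, and the numeric Address column and the text categories are assumptions. The gender clean-up assumes the problem is inconsistent spelling and case, which may not match the real file. The 1.5 x IQR rule is one common outlier convention among several.

```python
import pandas as pd

# Step 2: old name -> new name, taken from the table in the exercise.
RENAMES = {
    "Age": "age",
    "Gender": "gender",
    "Marital Status": "marital_status",
    "Address": "address",
    "Income": "income",
    "Income Category": "income_category",
    "Job Category": "job_category",
}


def load_demo(path="demo.xlsx"):
    """Steps 1-2 on the real file (reading .xlsx requires openpyxl)."""
    return pd.read_excel(path).rename(columns=RENAMES)


# Hypothetical stand-in rows so the sketch runs without demo.xlsx.
df = pd.DataFrame({
    "Age": [25, 40, 31, 58, 46, 37],
    "Gender": ["m", "f", "Male", "female", "f", "m"],
    "Marital Status": ["married", "no answer", "single",
                       "married", "no answer", "single"],
    "Address": [2, 10, 5, 20, 12, 7],
    "Income": [30.0, 55.0, 42.0, 900.0, 61.0, 48.0],
    "Income Category": ["low", "mid", "mid", "high", "mid", "mid"],
    "Job Category": ["sales", "tech", "tech", "exec", "sales", "tech"],
}).rename(columns=RENAMES)

# Steps 3-5: column names, then numeric and non-numeric summaries.
columns = list(df.columns)
numeric_summary = df.describe()
categorical_summary = df.describe(exclude=["number"])

# Steps 6-7: inspect the labels, then collapse spelling variants.
# Assumption: every label starts with "m" or "f" once lower-cased.
raw_genders = df["gender"].unique()
first_letter = df["gender"].str.strip().str.lower().str[0]
df["gender"] = first_letter.map({"m": "male", "f": "female"})

# Step 8: count rows whose marital status is exactly 'no answer'.
n_no_answer = int((df["marital_status"] == "no answer").sum())

# Step 9: keep only the numeric columns.
numeric = df.select_dtypes(include="number")

# Step 10: missing values per column.
missing = df.isna().sum()

# Step 11: flag incomes outside 1.5 * IQR of the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Step 12: Pearson correlation as a first look at age vs. income.
age_income_corr = df["age"].corr(df["income"])

# Step 13: how many rows have income above 300 units.
n_high_income = int((df["income"] > 300).sum())

# Step 14: the dtype pandas assigned to marital_status.
marital_dtype = df["marital_status"].dtype

# Step 15: one indicator column per gender label.
gender_dummies = pd.get_dummies(df["gender"], prefix="gender")
```

To run this against the real data, replace the stand-in frame with `df = load_demo()` and check the output of step 6 before deciding how to clean the gender column in step 7. A scatter plot of age against income would complement the single correlation figure in step 12.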
